Phenome of coeliac disease vs. inflammatory bowel disease

Coeliac disease (CeD) is characterized by gliadin-induced intestinal inflammation appearing in genetically susceptible individuals, such as HLA-DQ2.5 carriers. CeD, as well as other chronic intestinal disorders, such as Crohn's disease (CD) and ulcerative colitis (UC), has been associated with increased morbidity and mortality, but the causes are unknown. We systematically analysed CeD-associated diagnoses and compared them to conditions enriched in subjects with CD/UC as well as in HLA-DQ2.5 carriers without CeD. We compared the overall and cause-specific mortality and morbidity of 3,001 patients with CeD, 2,020 with CD, 4,399 with UC and 492,200 controls in the community-based UK Biobank. Disease-specific phenotypes were assessed with the multivariable Phenome Wide Association Study (PheWAS) method. Associations were adjusted for age, sex and body mass index. All disease groups displayed higher overall mortality than controls (CD: aHR = 1.91 [1.70–2.17]; UC: aHR = 1.32 [1.20–1.46]; CeD: aHR = 1.38 [1.22–1.55]). Cardiovascular and cancer-related deaths were responsible for the majority of fatalities. PheWAS analysis revealed 166 Phecodes overrepresented in all three disorders, whereas only ~20% of enriched Phecodes were disease specific. Seven of the 58 identified CeD-specific Phecodes were enriched in individuals homozygous for HLA-DQ2.5 without diagnosed CeD. Four out of these seven Phecodes and eight out of 19 HLA-DQ2.5 specific Phecodes were more common in homozygous HLA-DQ2.5 subjects with vs. without CeD, highlighting the interplay between genetics and diagnosis-related factors. Our study illustrates that the morbidity and mortality in CeD share similarities with CD/UC, while the CeD-restricted conditions might be driven by both inherited and acquired factors.


Results
Overall and cause-specific mortality of analysed intestinal diseases. Among the 502,488 individuals recruited to UKB, 3,001 were diagnosed with CeD, 4,399 with UC, and 2,020 with CD, while 868 subjects with more than one diagnosis of CeD, UC and CD were excluded.
A total of 492,200 individuals had none of these diagnoses and were used as a reference group (Supplementary Fig. S1A). All subgroups had a similar age distribution at baseline, while CeD subjects had the highest proportion of women and the lowest average body mass index (Table 1).
In all three intestinal disorders, cancer and cardiovascular diseases were responsible for the majority of fatalities (63-67%), whereas digestive diseases accounted for only 6-12% of deaths, with the lowest rate in CeD. Cancer- and digestive disease-specific mortality was elevated in all disorders, while cardiovascular and respiratory mortality was increased in CeD and CD but not UC subjects (Table 2). Overall, the pattern of disease-specific mortalities was similar in all three cohorts.
PheWAS analysis reveals disease-specific phenotypes. To decipher whether the observed comorbidities represent a conserved response to chronic intestinal injury or whether they constitute unique, disease-specific traits, we performed multivariable PheWAS analysis comparing the different intestinal diseases with the reference group (Fig. 1). Among all investigated intestinal disorders, an enrichment of Phecodes for gastrointestinal, haematopoietic and metabolic diseases, symptoms, and complications was observed. Osteoporosis, anaemia, and dermatological Phecodes were more prominent in patients with CeD than in patients with IBD (Fig. 1). The phenotypic overlap between the intestinal disorders became apparent in the Venn diagram, which revealed 166 Phecodes shared between all three disorders (Fig. 2). The number of shared Phecodes between CD and UC was significantly greater than that between CeD and one of the two inflammatory bowel disease subgroups (CeD and CD: 28; CeD and UC: 32; CD and UC: 68). The number of Phecodes specific to each disease was comparable for CeD and UC (CeD: 58, UC: 59), while CD had the highest number of unique Phecodes (CD: 79) (Fig. 2). Notably, only 18-23% of associated Phecodes were disease specific, while the others were shared by at least two disorders and 49-58% by all three disorders (Fig. 2).
Among the CeD-specific Phecodes (Fig. 3), those of well-established CeD-associated dermatologic manifestations such as "dermatitis herpetiformis" (OR = 122.2) and "bullous dermatoses" (OR = 23.6) displayed the highest odds ratios. Notably, Phecodes of autoimmune diseases were particularly prominent. They included diabetes and thyroid disorders (Fig. 3) and displayed odds ratios ranging from 3.6 ("other disorders of thyroid") to 14.3 ("other immunological findings"). Several cardiovascular and ocular disorders were also enriched in patients with CeD, with odds ratios of approximately 2.
The only Phecodes coding for malignancy that were significantly enriched in patients with CeD were "pancreatic cancer" (OR = 3.1), "non-Hodgkin's lymphoma" (OR = 2.0), and "malignancies of other lymphatic tissue" (OR = 1.9). As potential reasons for the relatively high respiratory disease-specific mortality, CeD individuals were more likely to harbour "other alveolar and parietoalveolar pneumonopathy" as well as "empyema and pneumothorax". Finally, gastrointestinal diseases played only a minor role among the CeD-specific Phecodes. In contrast, digestive and genitourinary conditions were more commonly seen among CD-/UC-specific Phecodes, while infectious, autoimmune and ocular codes were relatively rare (Supplementary Figs. S3, S4). Moreover, digestive disorders were highly enriched among conditions that are overrepresented both in UC and CD but not CeD individuals (18 out of 68 Phecodes). Notably, the latter subgroup also contained 9 infection-related Phecodes (Supplementary Table S3). Since Phecodes of autoimmune diseases constituted the most frequent disease spectrum enriched in subjects with CeD, we analysed whether autoimmune Phecodes were significantly associated with UC and CD as well. The corresponding Venn diagram revealed that five Phecodes were overrepresented in all intestinal disorders, with all of them being related to rheumatoid arthritis and psoriasis. One and two unique autoimmune Phecode(s) were more enriched in CD and UC individuals, respectively, compared to 13 in CeD subjects (Fig. 4).

[Figure caption, partial] [...] compared with the reference group without these diagnoses. The ten most significant associations are shown. Upwards/downwards pointing triangles refer to Phecodes that are over/underrepresented. The black line indicates the significance level after Bonferroni adjustment for multiple testing. All analyses were adjusted for sex, age, and body mass index, and p values are displayed in a −log10 format. NOS not otherwise specified.
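The percentages quoted for the Venn diagram (Fig. 2) follow directly from the reported counts. The short sketch below is only an arithmetic check, not part of the authors' analysis; the function name `venn_shares` is hypothetical, and the counts are copied from the text.

```python
# Phecode counts as reported in the text for Fig. 2.
specific = {"CeD": 58, "UC": 59, "CD": 79}
pairwise = {
    frozenset({"CeD", "CD"}): 28,
    frozenset({"CeD", "UC"}): 32,
    frozenset({"CD", "UC"}): 68,
}
shared_all = 166


def venn_shares(disease):
    """Return (total Phecodes, fraction disease specific, fraction shared by all three)."""
    # Add the two pairwise-only regions that involve this disease.
    pair_total = sum(n for pair, n in pairwise.items() if disease in pair)
    total = specific[disease] + pair_total + shared_all
    return total, specific[disease] / total, shared_all / total


for d in ("CeD", "UC", "CD"):
    total, frac_specific, frac_all = venn_shares(d)
    print(f"{d}: {total} Phecodes, {frac_specific:.1%} specific, "
          f"{frac_all:.1%} shared by all three")
```

With these counts, the disease-specific share lies between roughly 18% and 23% and the share common to all three disorders between roughly 49% and 58%, which matches the ranges given in the text.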

HLA-DQ2.5-dependent Phecodes and the impact of CeD diagnosis on the appearance of HLA-specific Phecodes.
As CeD is highly associated with specific HLA genotypes, we studied whether the underlying genetic background may contribute to the observed CeD-specific Phecodes. To this end, we focused on subjects with two DQ2.5 alleles, who are strongly predisposed to CeD (Supplementary Fig. S1B). A multivariable analysis restricted to patients without the diagnosis of CeD (Fig. 5A, Supplementary Fig. S1B) revealed that 19 Phecodes were overrepresented in individuals with two vs. 0-1 HLA-DQ2.5 alleles (Supplementary Table S1). The largest changes were observed in the categories "neoplasms" and "endocrine metabolic" (Fig. 5A). Significantly enriched Phecodes of malignant diseases comprised tumours of the respiratory system (OR = 1.6), "cancer of tongue" (OR = 2.6), "non-Hodgkin's lymphoma" (OR = 1.9), "large cell lymphoma" (OR = 2.0), and "cancer of other lymphatic, histiocytic tissue" (OR = 1.7). Among Phecodes encoding endocrine diseases, diabetes mellitus type 1 and its complications as well as thyroid disorders, including "hypothyroidism", "Graves' disease", and "thyrotoxicosis", were the most prominent (Supplementary Table S1). In addition, "chronic hepatitis" (OR = 3.8) and "nonproliferative glomerulonephritis" (OR = 3.0) were also significantly enriched in subjects with a genetic predisposition to, but without the concomitant diagnosis of, CeD (Fig. 5A, Supplementary Table S1). Out of the 19 Phecodes enriched in non-coeliac DQ2.5 homozygotes, seven were also uniquely enriched in CeD, i.e., were overrepresented in CeD subjects but not UC/CD individuals (Fig. 5B). These included diabetes and its complications, "Graves' disease", "non-Hodgkin lymphoma", and "cancer of lymphoid, histiocytic tissue". This finding prompted us to investigate whether the observed HLA-associated Phecodes were further affected by the presence of CeD. To that end, we compared HLA-DQ2.5 homozygous individuals with and without the diagnosis of CeD (Fig. 6, Supplementary Table S2).
Eight Phecodes were significantly enriched in the former group, while none of them was less frequent. These included "type 1 diabetes" (OR = 3.1) together with [...]. In contrast, the frequency of respiratory tract cancer and thyrotoxicosis/Graves' disease was similar in HLA-DQ2.5 homozygous subjects with and without the diagnosis of CeD (Fig. 6, Supplementary Table S2).
In conclusion, while some of the CeD-specific Phecodes were primarily driven by the HLA dosage, the majority might be HLA-independent or driven by a combination of genetic susceptibility and the presence of diagnosis-related factors.

Discussion
Our study demonstrated that all three analysed intestinal disorders are associated with overall excess mortality, with CD conferring the highest hazard ratio. Although this was suggested previously [13,17,18], the risks seen in the UKB cohort somewhat exceed the numbers reported in large meta-analyses. In all three diseases, cancer- and cardiovascular-related deaths accounted for the majority of the cases, and the specific death risks closely resembled the overall mortality for the corresponding intestinal disorder. While increased cancer-related mortality has been established for CD [18,19], the data on other disorders and the association between intestinal inflammation and cardiovascular mortality are conflicting, and further studies are needed to clarify the discrepant observations [13,19–22]. The excess digestive disease-related deaths that were particularly overrepresented in CD are well in line with published findings [13,22–24]. They, however, accounted for only a minority of cases and did not explain the elevated overall death rate. In summary, our data suggest that the presence of chronic intestinal disease, irrespective of its aetiology, increases overall mortality and that the predisposition to the most common causes of death, i.e., cardiovascular and malignant diseases, plays a significant role. This is reminiscent of rheumatoid arthritis, another chronic inflammatory disorder that is associated with increased cardiovascular mortality [25]. The similarities of intestinal inflammatory disorders seen on the mortality level were confirmed in PheWAS analysis showing that ~50% of all associated Phecodes were shared by all three conditions, while only ~20% were specific for one of these three diseases. Autoimmune disorders were particularly prominent in CeD, which is compatible with previous reports [26,27].
The marked association between CeD and autoimmune disorders is likely related to CeD's most established immune pathomechanisms and the strongest HLA association [28]. In this respect, we clearly demonstrated that some of the CeD-specific disorders, such as type 1 diabetes, thyroid disorders including Graves' disease, and non-Hodgkin lymphoma, are affected not only by the CeD itself but also by the associated genetic background. This is well in line with the association of type 1 diabetes, Graves' disease, and non-Hodgkin lymphoma with specific HLA haplotypes reported in the literature [29–31]. Taken together, our results both confirm and extend previous findings.
The major aim of our study was to shed some light on the interplay between genetic background and acquired factors in the development of CeD-related disorders. To this end, we specifically analysed the occurrence of HLA-DQ2.5-related Phecodes in HLA-DQ2.5 homozygous individuals with or without CeD. We demonstrated that eight out of 19 identified Phecodes were further enriched in homozygous HLA-DQ2.5 individuals with versus without CeD. This suggests that the presence of CeD-related factors promotes the formation of autoreactive immune cells and further amplifies genetic risks. This concept was previously reported in the literature and has been (among others) proposed to play a crucial role in the development of non-Hodgkin lymphoma, type 1 diabetes, and coeliac hepatitis [32–34].
Intestinal disease may also increase the susceptibility to other inflammatory diseases, such as rheumatoid arthritis and psoriasis, as these diseases are associated with all three intestinal disorders. Notably, IBD, psoriasis and rheumatoid arthritis display alterations in similar inflammatory pathways involving, among others, Th17 cells [35,36], and are targeted by comparable anti-inflammatory treatment strategies [4,5,37,38]. Although immunosuppressive drugs are not used for coeliac disease, the involvement of Th17 cells has been demonstrated [36,39], and multiple reports describe an association between CeD and rheumatoid arthritis as well as psoriasis [40–42]. However, the observed association of all analysed intestinal disorders with rheumatoid arthritis might be confounded by the difficult discrimination from enteropathic arthritis [43].
A major limitation of the study is its cohort design, which precludes the identification of causal relationships. Moreover, in the UK Biobank cohort, the diagnosis of CeD is based on ICD-10 codes, and no histological data are available. The prevalence of CeD in the UK Biobank is only approximately half of what would be expected from prevalence studies (~1%) [44]. This suggests that some cases remain undiagnosed and/or unreported, which is in line with previous population-based studies [45]. This can lead to the overrepresentation of more severe CeD cases and might be partly responsible for the high overall mortalities seen in our study. However, several well-performed studies used the same approach to define participants with CeD and observed a performance similar to that of other cohort studies [46,47]. In contrast, the prevalence of CD/UC subjects meets/exceeds the rates reported in other studies, which suggests a satisfactory diagnostic rate. Another important limitation is that due to the complexity of CeD, we were not able to address all potential contributors, such as the role of further genetic [...]

Methods
The UK Biobank is linked to the national death register, which provided age at death and primary ICD-10 diagnosis that led to death for all participants who died during or prior to April 2021. The end of follow-up was defined as death or the end of hospital inpatient data collection in April 2021. Causes of death were grouped according to their ICD codes into malignancies (C00-C97), digestive diseases (K00-K93), cardiovascular diseases (I00-I99), and respiratory diseases (J00-J99).
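The grouping of causes of death by ICD-10 range can be illustrated with a small lookup. This is a minimal sketch, assuming the primary cause is available as a string such as "C25.9" or "I219"; the function name and the "other" fallback category are illustrative choices, not taken from the study.

```python
import re

# Chapter letter and inclusive numeric range for each cause-of-death group,
# as listed in the text.
CAUSE_GROUPS = {
    "malignancy": ("C", 0, 97),
    "digestive": ("K", 0, 93),
    "cardiovascular": ("I", 0, 99),
    "respiratory": ("J", 0, 99),
}


def classify_cause_of_death(icd10_code):
    """Map a primary ICD-10 code to one of the four groups, or 'other'."""
    # Only the leading letter and the first two digits determine the group.
    match = re.match(r"^([A-Z])(\d{2})", icd10_code.strip().upper())
    if match is None:
        return "other"
    letter, number = match.group(1), int(match.group(2))
    for group, (chapter, low, high) in CAUSE_GROUPS.items():
        if letter == chapter and low <= number <= high:
            return group
    return "other"
```

Codes outside the four listed ranges, as well as malformed codes, fall into the illustrative "other" category here.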
For the first 50,000 subjects, genotyping was performed using the UK BiLEVE Axiom Array, while the Affymetrix UK Biobank Axiom Array was used for the remaining 450,000 participants. HLA-DQ2.5 status was determined using the SNP rs2187668.
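Deriving the HLA-DQ2.5 allele dosage from a single tag SNP amounts to counting copies of the tagging allele in each genotype. The sketch below is an illustration only: which rs2187668 allele tags DQ2.5, and on which strand the array reports it, is not stated in the text, so `tag_allele` is left to the caller, and both function names are hypothetical.

```python
def dq25_allele_count(genotype, tag_allele):
    """Count copies (0, 1 or 2) of the tagging allele in a genotype such as 'CT'.

    tag_allele is an assumption supplied by the caller; it must be checked
    against the genotyping array annotation before use.
    """
    genotype = genotype.strip().upper()
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return genotype.count(tag_allele.upper())


def is_dq25_homozygous(genotype, tag_allele):
    """True for carriers of two tagging alleles (the 'two vs. 0-1 alleles' split)."""
    return dq25_allele_count(genotype, tag_allele) == 2
```

For example, with a purely hypothetical tagging allele "T", the genotype "TT" would count as homozygous and "CT" would fall into the 0-1 allele group.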
The study was approved by the UKB Access Committee (Project #47527). All participants provided written consent for the study. The UK Biobank study has approval from the Northwest Multicentre Research-Ethics Committee. The manuscript is based solely on the analysis of pseudonymized data obtained from the UK Biobank.
PheWAS analysis. Participants were assigned to the disease-specific subgroups based on their ICD-10 diagnoses. Subjects with more than one of the analysed conditions (i.e., CeD, UC, or CD) were excluded (n = 868). The remaining participants were used as the reference cohort (Supplementary Fig. S1). ICD-10 diagnoses obtained from medical reports were collected for each subject, and duplicates were removed. To perform a PheWAS analysis, all ICD-10 codes were converted into Phecodes using the "PheWAS" R package [49,50]. Phecodes represent a high-throughput phenotyping tool used to rapidly define the case/control status of clinically meaningful diseases and conditions [48]. Using this package, a series of case-control tests were performed: (1) each analysed case group was generated by including patients with the corresponding Phecode; (2) individuals were assigned to the control group when they lacked the tested Phecode; and (3) to ensure statistical power, analysis was restricted to Phecodes with at least 200 cases [51]. Autoimmune Phecodes were identified using the official list of the American autoimmune association [52].
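The three case-control steps described above can be sketched in plain Python. This is not the "PheWAS" R package and it omits the adjustment for age, sex and BMI; it only shows how cases and controls are defined per Phecode and how the minimum-case filter applies. The function names are hypothetical, the 200-case threshold is assumed to refer to all subjects carrying the Phecode, and the odds ratio is a crude, unadjusted one.

```python
MIN_CASES = 200  # threshold stated in the text


def phecode_tables(phecodes_by_subject, disease_ids, min_cases=MIN_CASES):
    """Build one 2x2 table per Phecode.

    phecodes_by_subject: dict of subject id -> set of Phecodes (deduplicated)
    disease_ids: set of subject ids in the disease group; all others are reference
    Returns dict of Phecode -> (a, b, c, d) with
      a = disease group with the Phecode,    b = disease group without it,
      c = reference group with the Phecode,  d = reference group without it.
    """
    n_disease = sum(1 for s in phecodes_by_subject if s in disease_ids)
    n_reference = len(phecodes_by_subject) - n_disease

    # Steps (1) and (2): a subject is a case for every Phecode they carry
    # and a control for every Phecode they lack.
    carriers = {}
    for subject, codes in phecodes_by_subject.items():
        for code in codes:
            counts = carriers.setdefault(code, [0, 0])
            counts[0 if subject in disease_ids else 1] += 1

    # Step (3): keep only Phecodes with enough cases.
    tables = {}
    for code, (a, c) in carriers.items():
        if a + c >= min_cases:
            tables[code] = (a, n_disease - a, c, n_reference - c)
    return tables


def crude_odds_ratio(a, b, c, d):
    """Unadjusted odds ratio; adds 0.5 to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)
```

In the study itself, each association was estimated with multivariable logistic regression rather than a crude odds ratio, so the numbers from this sketch would not reproduce the reported values.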
Statistical analysis. All continuous variables are presented as the mean ± standard deviation. Categorical variables are displayed as absolute and relative frequencies. Odds and hazard ratios are presented with their corresponding 95% confidence intervals (CIs). Hazard ratios were calculated using Cox proportional hazard regression models. Mortality was depicted as deaths per 1000 person-years, which were calculated using the following formula: (number of deaths/total number of subjects)/mean survival * 1000. To test for independent associations, multivariable logistic regression was used. All multivariable analyses were adjusted for age, sex and body mass index (BMI). PheWAS analysis was performed using the "PheWAS" R package. Bonferroni correction was used to adjust for multiple testing, and differences were considered to be statistically significant when p < 0.05. The data were analysed using SPSS Statistics version 27 (IBM; Armonk, NY, USA) and visualized with Prism version 8 (GraphPad, La Jolla, CA, USA).
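The mortality formula and the Bonferroni adjustment described above can be written out as two small helpers. This is a minimal sketch, assuming mean survival is expressed in years and that the Bonferroni correction is applied by dividing the significance level by the number of tests; the function names and the example numbers in the usage note are hypothetical.

```python
def deaths_per_1000_person_years(n_deaths, n_subjects, mean_survival_years):
    """(number of deaths / total number of subjects) / mean survival * 1000."""
    if n_subjects <= 0 or mean_survival_years <= 0:
        raise ValueError("n_subjects and mean_survival_years must be positive")
    return (n_deaths / n_subjects) / mean_survival_years * 1000


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests
```

As a purely hypothetical example, 50 deaths among 1,000 subjects with a mean survival of 10 years correspond to 5 deaths per 1000 person-years, and 1,000 tested Phecodes at an overall level of 0.05 give a per-test threshold of 0.00005.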

Data availability
The data underlying this article are part of the UK Biobank database (https://www.ukbiobank.ac.uk/) and can be accessed after prior application. This research has been conducted under Application Number 47527.

[Figure caption] The occurrence of the highlighted Phecodes was compared in HLA-DQ2.5 homozygous individuals with vs. without the diagnosis of coeliac disease. Odds ratios (ORs) and the corresponding 95% confidence intervals are shown. Nonproliferative glomerulonephritis and tongue cancer were not included since they were not present in the group of CeD patients with HLA-DQ2.5 homozygosity. NOS not otherwise specified.